Conditionalizing on Knowledge
Timothy Williamson
Abstract
A theory of evidential probability is developed from two assumptions: (1) the evidential probability of a proposition is its probability conditional on the total evidence; (2) one's total evidence is one's total knowledge. Evidential probability is distinguished from both subjective and objective probability. Loss as well as gain of evidence is permitted. Evidential probability is embedded within epistemic logic by means of possible worlds semantics for modal logic; this allows a natural theory of higher-order probability to be developed. In particular, it is emphasized that it is sometimes uncertain which propositions are part of one's total evidence; some surprising implications of this fact are drawn out.

1 Evidential probability
2 Uncertain evidence
3 Evidence and knowledge
4 Epistemic accessibility
5 A puzzling phenomenon
Appendix I: Proofs
Appendix II: A non-symmetric epistemic model

1 Evidential probability

When we give evidence for our theories, the propositions we cite are themselves uncertain. Probabilistic theories of evidence have notorious difficulty in accommodating that obvious fact. This paper embeds the fact in a probabilistic theory of evidence. The analysis of uncertainty leads naturally to a simple theory of higher-order probabilities. The first step is to focus on the relevant notion of probability. Given a scientific hypothesis h, we can intelligibly ask: how probable is h on present evidence? We are asking how much the evidence tells for or against the hypothesis. We are not asking what objective physical chance or frequency of truth h has. A proposed law of nature may be quite improbable on present evidence even though its objective chance of truth is 1 (that the evidence bearing on h may include evidence about objective chances or frequencies is irrelevant). Equally, we are not asking about anyone's actual degree of belief (credence) in h. Present evidence may tell strongly against h, even though everyone is irrationally certain of h.
© Oxford University Press 1998

Is the probability of h on our evidence the credence a perfectly rational being with our evidence would give to h? That suggestion comes closer to what is intended, but not close enough. It fails in the way counterfactual analyses usually fail, by ignoring side-effects of the conditional's antecedent on the truth-value of the analysandum.¹ For example, to say that the hypothesis that there are no perfectly rational beings is very probable on our evidence is not to say that a perfectly rational being with our evidence would be very confident that there are no perfectly rational beings. To make the point more carefully, let p be a logical truth such that in this imperfect world it is very probable on our evidence that no one has great credence in p. There are such logical truths, although in the nature of the case we cannot be confident that we have identified an example; for all we know, the proposition that Goldbach's Conjecture is a theorem of first-order Peano Arithmetic is one.² Let h be the hypothesis that no one has great credence in p. By assumption, h is very probable on our evidence. On the view in question, a perfectly rational being with our evidence would therefore have great credence in h. Since p is a logical truth, h is logically equivalent to the conjunction p & h; since a perfectly rational being would have the same credence in logically equivalent hypotheses, it would have great credence in p & h. But that is absurd, for p & h is of the Moore-paradoxical form 'A and no one has great credence in the proposition that A'; to have great credence in p & h would be self-defeating and irrational. One can have great credence in a true proposition of that form only by irrationally having greater credence in the conjunction than in its first conjunct. Thus the probability of a hypothesis on our evidence does not always coincide with the credence a perfectly rational being with our evidence would have in it.³
Thus we cannot use decision theory as a guide to evidential probability. Suppose, for example, that anyone whose credences have distribution P is vulnerable to a Dutch Book. It may follow that the credences of a perfectly rational being would not have distribution P, if a perfectly rational being would not be vulnerable to a Dutch Book; but it would be fallacious to conclude that probabilities on our evidence do not have distribution P, for those probabilities need not coincide with the hypothetical credences of a perfectly rational being. Perhaps only an imperfectly rational being could have exactly our evidence, which includes our evidence about ourselves. The irrationality of distributing credence according to the probabilities on one's evidence may simply reflect one's limited rationality, as reflected in one's evidence. But it would be foolish to respond by confining evidential probability to the evidence of a perfectly rational creature. That would largely void the notion of interest; we care about probabilities on our evidence. For all that has been said, any agent with credences that fail to satisfy subjective Bayesian constraints may be eo ipso subject to rational criticism.

1. Shope [1978].
2. It is not highly probable on our evidence that no one will ever give high credence to the proposition that Goldbach's Conjecture is a theorem of first-order Peano Arithmetic. To eternalize the example, imagine good evidence that nuclear war is about to end all intelligent life.
3. Presumably, a perfectly rational being must give great credence to p, be aware of doing so and therefore give little credence to h and so to p & h; but then its evidence about its own states would be different from ours. If so, the hypothesis of a perfectly rational being with our evidence is impossible. For an argument that the subjective Bayesian conception of perfect rationality entails perfect accuracy about one's own credences, see Milne [1991].
This would apply in particular to the agent's beliefs about probabilities on its evidence. But it would apply equally to the agent's beliefs about objective physical chances, or anything else. Just as it implies nothing specific about objective physical chances, so it implies nothing specific about probabilities on evidence. What then are probabilities on evidence? We should resist demands for an operational definition; such demands are as damaging in the philosophy of science as they are in science itself. To require mathematicians to give a precise definition of 'set' would be to abolish set theory. Sometimes the best policy is to go ahead and theorize with a vague but powerful notion. One's original intuitive understanding becomes refined as a result, although rarely to the point of a definition in precise pretheoretic terms. That policy will be pursued here. The discussion will assume an initial probability distribution P. P does not represent actual or hypothetical credences. Rather, P measures something like the intrinsic plausibility of hypotheses prior to investigation (this notion of intrinsic plausibility can vary in extension between contexts). P will be assumed to satisfy a standard set of axioms for the probability calculus: P(p) is a non-negative real number for every proposition p; P(p) = 1 whenever p is a logical truth; P(p ∨ q) = P(p) + P(q) whenever p is inconsistent with q. If P(q) > 0, the conditional probability of p on q, P(p|q), is defined as P(p & q)/P(q). P(p) is taken to be defined for all propositions; the standard objection that the subject may never have considered p is irrelevant to the non-subjective probability P. But P is not assumed to be syntactically definable. Carnap's programme is moribund. The difference between green and grue is not a formal one. Consider an analogy. The concept of possibility is vague and cannot be defined syntactically. But that does not show that it is spurious. In fact, it is indispensable.
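The axioms just listed can be pictured in a small finite model, not from the paper itself: propositions are modelled as sets of possible circumstances, and P sums normalized non-negative weights. The names (`Dist`, `cond`) and the coin-toss example are illustrative assumptions only.

```python
from itertools import product

# A toy finite model of the axioms: a "circumstance" is a possible state,
# a proposition is the set of circumstances in which it is true, and P
# assigns normalized non-negative weights to circumstances.

class Dist:
    def __init__(self, weights):
        total = sum(weights.values())
        assert total > 0 and all(v >= 0 for v in weights.values())
        self.w = {s: v / total for s, v in weights.items()}

    def p(self, prop):
        """P(prop), where prop is a set of circumstances."""
        return sum(v for s, v in self.w.items() if s in prop)

    def cond(self, p, q):
        """P(p|q) = P(p & q)/P(q); defined only when P(q) > 0."""
        return self.p(p & q) / self.p(q)

# Two fair coin tosses: four equiprobable circumstances.
states = set(product("HT", repeat=2))
P = Dist({s: 1 for s in states})
both_heads = {("H", "H")}
first_heads = {s for s in states if s[0] == "H"}

print(P.p(both_heads))                  # 0.25
print(P.cond(both_heads, first_heads))  # 0.5
```

In this model a logical truth is the set of all circumstances and automatically gets probability 1, and disjoint propositions (sets) add, matching the additivity axiom.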
Moreover, we know some sharp structural constraints on it: for example, that a disjunction is possible if and only if at least one of its disjuncts is possible. The present suggestion is that probability is in the same boat as possibility, and none the worse for that. On the view to be defended here, the probability of a hypothesis h on total evidence e is P(h|e). An account will be given of when a proposition e constitutes one's total evidence. The best that evidence can do for a hypothesis is to entail it (so P(h|e) = 1); the worst that evidence can do is to be inconsistent with it (so P(h|e) = 0). Between those extremes, the initial probability distribution provides a continuum of intermediate cases, in which the evidence comes more or less close to requiring or ruling out the hypothesis. The axioms entail that logically equivalent propositions have the same probability on given evidence. The reason is not that a perfectly rational being would have the same credence in them, for the irrelevance of such beings to evidential probability has already been noted. The axioms are not idealizations, false in the real world. Rather, they stipulate what kind of thing we are choosing to study. We are using a notion of probability which (like the notion of incompatibility) is insensitive to differences between logically equivalent propositions. We thereby gain mathematical power and simplicity at the loss of some descriptive detail (e.g. in the epistemology of mathematics): a familiar bargain. The characterization of the prior distribution for evidential probability is blatantly vague. If that seems to disadvantage it with respect to subjective Bayesian credences, which can be more precisely defined in terms of consistent betting behaviour, the contrast in precision disappears in epistemological applications.
Given a finite body of evidence e, almost any posterior distribution results from a sufficiently eccentric prior distribution by Bayesian updating on e. Theorems on the 'washing out' of differences between priors by updating on evidence apply only 'in the limit'; they tell us nothing about where we are now.⁴ Successful Bayesian treatments of specific epistemological problems (e.g. Hempel's paradox of the ravens) assume that subjects have 'reasonable' prior distributions. We judge a prior distribution reasonable if it complies with our intuitions about the intrinsic plausibility of hypotheses. This is the same sort of vagueness as infects the present approach, if slightly better hidden. One strength of Bayesianism is that the mathematical structure of the probability calculus allows it to make illuminating distinctions that other approaches miss and to provide a qualitatively fine-grained analysis of epistemological problems, given assumptions about all reasonable prior distributions. That strength is common to subjective and objective Bayesianism, for it depends on the structure of the probability calculus. On the present approach, which can be regarded as a form of objective Bayesianism, the axioms of probability theory embody substantive assumptions, as the axioms of set theory do. For example, the restriction of probabilities to real numbers limits the number of gradations in probability to the cardinality of the continuum. Just as the axioms of set theory refine our notion of set, so the axioms of probability theory refine our notion of evidential probability. Those remarks are not intended to smother all doubts about the initial probability distribution. Their aim is to justify the procedure of tentatively postulating such a distribution, in order to see what use can be made of it in developing a theory of evidential probability. That is the focus of this paper.

4. See Earman [1992], pp. 137-61 for a sophisticated discussion.
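The 'washing out' point can be illustrated numerically. In this sketch (the priors and hypotheses are invented, not the paper's), two agents with quite different priors over a coin's bias update on the same data; their posteriors approach each other as evidence accumulates, but after any finite sample they can still differ.

```python
# 'Washing out' in miniature: two eccentric priors over a coin's bias,
# updated on the same observed data by Bayes' theorem. The limit theorems
# promise convergence; they say nothing about any finite stage.

def posterior(prior, heads, tails):
    """Posterior over hypothesised biases, given observed heads/tails."""
    likes = {b: p * (b ** heads) * ((1 - b) ** tails)
             for b, p in prior.items()}
    z = sum(likes.values())
    return {b: v / z for b, v in likes.items()}

agent1 = {0.1: 0.80, 0.5: 0.15, 0.9: 0.05}   # eccentric prior
agent2 = {0.1: 0.05, 0.5: 0.10, 0.9: 0.85}   # opposite eccentric prior

for n in (0, 10, 100):                        # n heads and n tails observed
    p1 = posterior(agent1, n, n)[0.5]
    p2 = posterior(agent2, n, n)[0.5]
    print(n, round(p1, 4), round(p2, 4))
```

With no data the two agents disagree sharply about the fair-coin hypothesis; after 200 balanced tosses both are close to certain of it, which illustrates why appeals to washing out do not settle what the probabilities on our present, finite evidence are.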
2 Uncertain evidence

Suppose that evidential probabilities are indeed probabilities conditional on the evidence. Then, trivially, the evidence itself has evidential probability 1: P(e|e) = 1 whenever it is defined. Does this require evidence to be absolutely certain? If so, how can evidential probabilities fit into a non-Cartesian epistemology? Given that evidential probabilities are probabilities conditional on the evidence, one cannot avoid attributing evidential probability 1 to the evidence by denying that evidence is propositional, for the probabilities of hypotheses conditional on the evidence are defined only if evidence is propositional. P(h|e) = P(h & e)/P(e); this equation makes sense only if e is propositional. Since the present approach identifies evidential probabilities with probabilities conditional on the evidence, it is committed to treating evidence as propositional.⁶ We should question the association between evidential probability 1 and absolute certainty. For subjective Bayesians, probability 1 is the highest possible degree of belief, which presumably is absolute certainty. If one's credence in p is 1, one should be willing to accept a bet on which one gains a penny if p is true and is tortured horribly to death if p is false. Few propositions pass that test. But since evidential probabilities are not actual or counterfactual credences, why should evidential probability 1 entail absolute certainty? There is a further link between probability 1 and certainty. Bayesian accounts of learning from experience give a significance to probability 1 which does not depend on any identification of probabilities with actual or counterfactual credences. Suppose that the new evidence gained on some occasion is e. On the standard Bayesian account of this simple case, probabilities should be updated by conditionalization on e.
The updated unconditional probability of p is its previous probability conditional on e:

BCOND: Pnew(p) = Pold(p|e) = Pold(p & e)/Pold(e)   (Pold(e) ≠ 0)

We can interpret BCOND as a claim about evidential probabilities.⁵ Note that Pold is not the absolutely prior probability P, but probability on all the evidence gained prior to e. Suppose further, as Bayesians often do, that such conditionalization is the only form of updating which the probabilities undergo. By BCOND, Pnew(e) = 1. When Pnew is updated to Pnewer by conditionalization on still newer evidence f, Pnewer(e) = Pnew(e|f) = Pnew(e & f)/Pnew(f) = 1 whenever conditionalization on f is defined. Thus e will retain probability 1 through all further conditionalizations; since no other form of updating is contemplated, e will retain probability 1. Once a proposition has been evidence, its status is as good as evidence ever after; probability 1 is a lifetime's commitment. On this model of updating, when a proposition becomes evidence it acquires an epistemically privileged feature which it cannot subsequently lose. How can that be? Surely any proposition learnt from experience can in principle be epistemically undermined by further experience. What propositions could attain that unassailable epistemic status? Science treats as evidence propositions such as '13 of the 20 rats injected with the drug died within 24 hours'; one may discover tomorrow that a disaffected laboratory technician had substituted dead rats for living ones. The Cartesian move is to find certainty in propositions about one's own current mental state ('I seem to see a dead rat'; 'My current degree of belief that 13 of the 20 rats died is 0.97'). Arguably, we are fallible even about our own current mental states.⁷

5. No attempt will be made to survey the non-Bayesian theories of evidential probability in the literature. See e.g. Kyburg [1974] and Plantinga [1993].
6. Williamson [1997] defends the assumption that evidence is propositional.
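The 'lifetime's commitment' point can be checked mechanically. In this minimal sketch (the finite world set and function names are my own, not the paper's), once conditionalization gives a proposition probability 1, no further conditionalization can lower it.

```python
# BCOND over a finite set of 'worlds': propositions are sets of worlds,
# distributions are dicts from worlds to weights. Once e has probability 1,
# every later conditionalization preserves that probability.

def conditionalize(P, e):
    """Return P(.|e): renormalize inside e, zero outside; needs P(e) > 0."""
    z = sum(p for w, p in P.items() if w in e)
    assert z > 0, "conditionalization on e is undefined when P(e) = 0"
    return {w: (p / z if w in e else 0.0) for w, p in P.items()}

def prob(P, prop):
    return sum(p for w, p in P.items() if w in prop)

P = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
e = {0, 1}          # evidence gained first
f = {1, 2, 3}       # later evidence

P1 = conditionalize(P, e)
print(prob(P1, e))  # 1.0
P2 = conditionalize(P1, f)
print(prob(P2, e))  # 1.0: e keeps probability 1 ever after
```

The reason is visible in the code: conditionalization never moves weight back onto worlds that already have probability 0, so the set of probability-1 propositions can only grow.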
But even if that point is waived, and we are assumed to be infallible about a mental state when we are in it, we do not remain infallible about it later. However certain I am today of the proposition that I now express by the sentence 'I seem to see a dead rat', I may be uncertain tomorrow of the same proposition, then expressed by the sentence 'Yesterday I seemed to see a dead rat'; I can wonder whether I really remember seeming to see a dead rat, or only imagine it.⁸ We are uncontroversially fallible about our own past mental states. We are likewise fallible about the mental states of others. You can doubt whether I seem to myself to see a dead rat; even if I tell you that I seem to myself to see one, you may wonder whether I am lying. Yet science relies on intersubjectively available evidence. Even Bayesian epistemologists assume that evidence is intersubjectively available. Consider, for instance, the arguments that individual differences between prior probability distributions are 'washed out' in the long run by conditionalization on accumulating evidence; they typically assume that different individuals are conditionalizing on the same evidence.⁹

7. Williamson [1996a].
8. Perhaps 'I seem to see a dead rat' (uttered by me today) and 'Yesterday I seemed to see a dead rat' (uttered by me tomorrow) do not express exactly the same proposition. But if I can think tomorrow the proposition expressed by 'I seem to see a dead rat' (uttered by me today), then that proposition can become uncertain for me; if I cannot even think it tomorrow, then the problem is even worse, because I cannot retain my evidence.
9. In some cases it can be shown that, although our evidence is different, our beliefs will almost certainly converge on each other because they will almost certainly converge on the truth.
For example, if a bag contains ten red or black balls, and we take it in turns to draw a ball with replacement, each observing our own draws and not the other's, and conditionalizing on the results, our posterior probabilities for the number of black balls in the bag will almost certainly converge to the same values, even if our prior probabilities are quite different (if we both assign non-zero prior probabilities to all eleven possibilities). But even this assumes that our evidence consists of true propositions about the results of the draws, not propositions about our mental states; where does that assumption come from, on a subjective Bayesian view?

If we start with different prior probabilities, and I conditionalize on evidence about my mental state while you conditionalize on evidence about your mental state, our posterior probabilities need not converge. The point generalizes. It is tempting to make a proposition p certain for a subject S at a time t by attributing a special authority to S's belief at t in p. But then belief in p by other subjects or at other times has a special lack of authority, because it is trumped by S's belief at t. For example, to the extent to which eyewitness reports of an event have a special status, non-eyewitness reports are vulnerable to being overturned by them. Thus it is hard to see how any empirical proposition could have the intertemporal and intersubjective certainty which the conditionalization account demands of evidence. The standard response is to generalize Bayesian conditionalization to Jeffrey conditionalization (probability kinematics). For a proposition p, in Bayesian conditionalization on e (0 < Pold(e) < 1):

(i) Pold(p) = Pold(e)Pold(p|e) + Pold(~e)Pold(p|~e)
(ii) Pnew(p) = Pnew(e)Pold(p|e) + Pnew(~e)Pold(p|~e)

For BCOND, the weights Pnew(e) and Pnew(~e) in (ii) are 1 and 0 respectively.
Probabilities conditional on e are unchanged (Pnew(p|e) = Pold(p|e)). What has changed is their weight in determining unconditional probabilities; it has increased from Pold(e) to 1. But when experience makes e more probable without making it certain, Jeffrey conditionalization allows us to retain (ii) ((i) is automatic) and make Pnew(e) larger than Pold(e) without making it 1. This increases the weight of probabilities conditional on e at the expense of probabilities conditional on ~e, while giving some weight to both. More generally, experience may cause us to redistribute probability amongst various possibilities, whilst leaving probabilities conditional on those possibilities fixed. Let {e1, ..., en} be a partition (i.e. as a matter of logic, exactly one proposition in the set is true) such that Pold(ei) > 0 for 1 ≤ i ≤ n.¹⁰ Then Pnew comes from Pold by Jeffrey conditionalization with respect to {e1, ..., en} just in case every proposition p satisfies:

JCOND: Pnew(p) = Σi Pnew(ei)Pold(p|ei)

Bayesian conditionalization is just the special case where {e1, ..., en} = {e, ~e} and Pnew(e) = 1. Jeffrey conditionalization cannot reduce probabilities from 1. If Pold(p) = 1 then Pnew(p) = 1 by JCOND. The idea is rather that no empirical proposition need acquire probability 1 when one learns from experience. On the approach of this paper, by contrast, evidence must have evidential probability 1, and some empirical propositions must be evidence if evidential probabilities are ever to change. Should the present approach be modified to permit Jeffrey conditionalization? The updating of evidential probability by Jeffrey conditionalization is hard to integrate with any adequate epistemology, because we have no substantive answer to the question: what should the new weights Pnew(ei) be?

10. For mathematical simplicity, infinite partitions are ignored.
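JCOND can be sketched directly. In this illustration (the function, the candlelight numbers, and the target distribution are my own assumptions, not the paper's), new weights are stipulated for the cells of a partition while probabilities conditional on each cell are held fixed; the last lines also exhibit the triviality worry of footnote 11, that a fine enough partition lets JCOND turn any old distribution into any target that preserves probability 1.

```python
# Jeffrey conditionalization over a finite set of worlds: partition cells
# are sets of worlds, and each cell's new weight is stipulated.

def jeffrey(P, partition, new_weights):
    """P_new(w) = sum_i new_weights[i] * P_old(w|e_i)."""
    Pnew = {}
    for cell, wt in zip(partition, new_weights):
        z = sum(P[w] for w in cell)
        for w in cell:
            Pnew[w] = wt * (P[w] / z)
    return Pnew

# A cloth glimpsed by candlelight: worlds 'g' (green) and 'b' (blue).
P_old = {'g': 0.5, 'b': 0.5}
partition = [{'g'}, {'b'}]

print(jeffrey(P_old, partition, [0.8, 0.2]))  # {'g': 0.8, 'b': 0.2}
# Bayesian conditionalization is the special case with weights 1 and 0:
print(jeffrey(P_old, partition, [1.0, 0.0]))  # {'g': 1.0, 'b': 0.0}

# On a partition of singletons, JCOND reaches any target distribution
# (footnote 11's point in miniature):
P_target = {'g': 0.3, 'b': 0.7}
print(jeffrey(P_old, partition, [0.3, 0.7]) == P_target)  # True
```

Nothing in the mathematics constrains the stipulated weights, which is the text's complaint: JCOND by itself gives no substantive answer to what the new weights should be.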
Indeed, if sufficiently fine partitions are used, any probability distribution Pnew is the outcome of any probability distribution Pold by JCOND, provided only that Pnew(p) = 1 whenever Pold(p) = 1 and the set of relevant propositions is finite.¹¹ Arguably, the same applies to BCOND.¹² But there is a simple schematic answer to the epistemological question 'Which instances of BCOND update evidential probability?': those in which e is one's new evidence. Although that answer immediately raises the further question 'What is one's evidence?', it still constitutes progress, for it divides the theoretical labour, allowing other work in epistemology and in philosophy of science to provide Bayesianism with its theory of evidence. To the parallel question 'Which instances of JCOND update evidential probability?', no such simple answer will do. Jeffrey conditionalization is not conditionalization on evidence-constituting propositions. Moreover, the weights P(ei) are highly sensitive to background knowledge. When I see a cloth by candlelight, the new ...

11. Proof: Let the propositions of interest be p1, ..., pm. Each of the 2^m possible distributions of truth-values to them corresponds to a conjunction of pi or ~pi for 1 ≤ i ≤ m. These 2^m conjunctions form a partition. Perhaps Pold(g) = 0 for some such conjunction g; by disjoining each such g with a conjunction f such that Pold(f) > 0, form a partition {e1, ..., en} such that Pold(ei) > 0 for 1 ≤ i ≤ n. Each ei is equivalent to a disjunction fi ∨ gi, where Pold(fi) > 0, Pold(gi) = 0, and fi either entails pj or entails ~pj (1 ≤ j ≤ m). Since Pold(gi) = 0, standard reasoning shows that Pold(fi|ei) = 1, so Pold(pj|ei) = Pold(pj|fi). Thus Pold(pj|ei) is 1 or 0, depending on whether fi entails pj or ~pj. Now suppose that for every proposition q, if Pold(q) = 1 then Pnew(q) = 1. By parallel reasoning, if Pnew(ei) ≠ 0 then Pnew(pj|ei) is 1 or 0, depending on whether fi entails pj or ~pj. Thus if Pnew(ei) ≠ 0, Pnew(pj|ei) = Pold(pj|ei) (1 ≤ i ≤ n).
From this, JCOND is a routine corollary, with any pj in place of p.

12. It depends on whether one can introduce finer distinctions than those made by the propositions of interest. If not, and only two possibilities can be distinguished, then no Bayesian conditionalization can change Pold to Pnew, where ... Choose c > 0 such that for all p for which the probabilities are defined, cPnew(p) ≤ Pold(p). Introduce a new proposition t, bifurcating ei into ei & t and ei & ~t. Determine a probability distribution P* for the refined partition by: P*(...) ...

ECOND (on which the evidential probability distribution Pw in a circumstance w comes from conditionalizing the prior P on one's total evidence ew in w, so that Pw(p) = P(p|ew)) formalizes PROPOSITIONALITY. It allows MONOTONICITY to fail, for if one forgets something between t and a later time t*, being in circumstances w and w* at t and t* respectively, then ew* need not entail ew, so possibly Pw*(ew) < 1 even though Pw(ew) = 1. Thus a proposition can decrease in probability from 1. In that sense, evidence need not be certain. When no evidence is lost between w and w*, ew* is equivalent to ew & f, where f is the conjunction of the new evidence gained in that interval, and ECOND implies that Pw* results from conditionalizing Pw on the new evidence f. Formally, for any proposition p:

Pw*(p) = P(p & ew & f)/P(ew & f)
       = (P(p & ew & f)/P(ew))/(P(ew & f)/P(ew))
       = Pw(p & f)/Pw(f)
       = Pw(p|f)

BCOND is the special case of ECOND when evidence is cumulative. Thus Bayesian conditionalization can be recovered when needed. The distribution P is conceptually rather than temporally prior; it need not coincide with Pw for any circumstance w that some subject is in at some time, for P is not a distribution of credences, and the subject may have non-trivial evidence at every time.¹⁷ An incidental advantage of this approach is that it helps with the problem of old evidence.¹⁸ One would like to say that e confirms h just in case the conditional probability of h on e is higher than the unconditional probability of h.
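The failure of MONOTONICITY under ECOND can be made concrete. In this sketch (the worlds, weights, and `econd` name are illustrative assumptions, not the paper's), the evidential probability at a circumstance is the prior conditional on the total evidence there; when the later evidence is weaker, a proposition that was evidence falls from probability 1.

```python
# ECOND in miniature: evidential probability in circumstance w is the
# prior P conditional on the total evidence e_w there. Forgetting between
# t and t* weakens the evidence, so old evidence can lose probability 1.

def econd(P, e):
    """Return the function prop -> P(prop & e)/P(e) for total evidence e."""
    z = sum(pr for world, pr in P.items() if world in e)
    return lambda prop: sum(pr for world, pr in P.items()
                            if world in e and world in prop) / z

P = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}   # prior over four circumstances
e_w = {1}                                  # strong total evidence at t
e_wstar = {1, 2, 3}                        # weaker total evidence at t*

P_w = econd(P, e_w)
P_wstar = econd(P, e_wstar)
print(P_w(e_w))      # 1.0
print(P_wstar(e_w))  # below 1: what was evidence is no longer certain
```

When instead e_wstar is e_w strengthened by new evidence f, the same function reproduces BCOND, matching the derivation above.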
If e is already part of the evidence, then its probability is 1 and the conditional probabilities are identical; yet old evidence does sometimes confirm hypotheses.¹⁹ Appeals are sometimes made to probabilities in past or counterfactual circumstances in which the evidence does not include e, but they produce anomalous results, because the evidence in those circumstances may be distorted by irrelevant factors. Example: a coin is tossed ten times. Let h be the hypothesis that it landed the same way each time. The initial probability of h is 1/2⁹. Witness A says 'I saw the first six tosses; it landed heads each time'. Witness B then says 'I saw the last four tosses; it landed tails each time'; let e be the proposition that B said this. We have no reason to doubt A and B; if they are both telling the truth, h is false. But B's statement causes A to break down; he admits that he was lying, and has no relevant knowledge. If B had not made his statement, A would not have withdrawn his, and there would have been no reason to suspect that he was lying.

17. Compare the notion of a diary in Skyrms [1983]. Skyrms's discussion concentrates on the problem of memory storage, but remembering which propositions are evidence is no worse than remembering a probability for each proposition. Of course, it is often rational to retain a belief even when one has forgotten one's past evidence for it. In some cases the belief itself has attained the status of evidence (see Section 3); in others one has only indirect evidence for it (e.g. one seems to remember p and is usually right about such things). But even those beliefs are evidentially probable at t only if one's evidence at t supports them. See Harman [1986] for much relevant discussion of clutter avoidance.
18. See Glymour [1980], pp. 85-93, Earman [1992], pp. 119-35, Howson and Urbach [1993], pp. 403-8, and Maher [1996].
Thus in the nearest past or counterfactual circumstances in which e was not part of our evidence, the conditional evidential probability of h on e is lower than the unconditional evidential probability of h. Nevertheless, in our present situation e does confirm h, for since we still have no reason to doubt B, the probability of h on our evidence is around 1/2⁶. Once we have the prior probability distribution P, we can say that P(h|e) > P(h). Of course, these remarks are schematic, but at least the general form of the solution does not introduce the irrelevant complications consequent on an identification of the probabilities with past or counterfactual credences.

3 Evidence and knowledge

Which propositions are one's evidence? Without a substantive conception of evidence, probabilistic epistemology is empty; in practice, it has taken the existence of such a conception for granted without itself supplying one. Different conceptions of evidence are compatible with ECOND. A simple, natural proposal is that one's evidence is one's body of knowledge. More precisely, the total evidence ew of an individual or community S in a circumstance w is the conjunction of all the propositions S knows in w. Call that equation E = K.²⁰ Since evidence can lose probability 1, the defeasibility of knowledge by later evidence is no objection to E = K. When I see the black ball put into the bag, the proposition that a black ball was put into the bag becomes part of my evidence, because I know that a black ball was put into the bag. When I have seen a red ball drawn each time on the first ten thousand draws, that further evidence undermines my knowledge that a black ball was put into the bag, and the previously known proposition ceases to be part of my evidence. Since only true propositions are known, evidence consists entirely of true propositions, but one true proposition can cast doubt on another.
Subjective Bayesians might identify one's evidence with one's beliefs (understood as propositions of subjective probability 1) rather than with one's knowledge (E = B). Given E = B, one can manufacture evidence for one's favourite theories simply by getting oneself into a state of certainty about appropriate propositions (e.g. that one has just seen one's guru perform a miracle). That does not capture the spirit of the injunction to proportion one's belief to one's evidence. No positive argument will be developed here for E = K.²¹ The rest of the paper develops the conjunction of E = K with ECOND as a theory of evidential probabilities, in a way which indicates at least their mutual coherence. The concept of knowledge is sometimes regarded as a kind of survival from stone-age thinking, to be replaced by probabilistic concepts for the purposes of serious twentieth-century epistemology. That view assumes that the probabilistic concepts do not depend on the concept of knowledge. If E = K and ECOND are true, that assumption is false. The concepts of knowledge and evidential probability are complementary; neither can replace the other.

19. We can relativize confirmation to background information f by requiring that P(h|e & f) > P(h|f), but this does not justify subjecting it to the vagaries of the evidence we once or would have had.
20. For a general defence of E = K, see Williamson [1997]. Compare Maher [1996], according to which all evidence is knowledge but not all knowledge is evidence. Restrictive views of evidence can make unnecessary problems for conditionalization by not allowing propositions about the subject's updated belief state to count as part of the new evidence; this may explain the cases discussed in Howson [1996] and Castell [1996].
Some initially surprising results of the theory stem from the fact that we are not always in a position to know whether we know something; by E = K, we are not always in a position to know whether something is part of our evidence. This consequence is independently plausible. Whether something is part of our evidence does not depend solely on whether we believe it to be part of our evidence. That p is part of our evidence is a non-trivial condition; arguably, no non-trivial condition is such that whenever it obtains one is in a position to know that it obtains.²² But if we are not always in a position to know whether something is part of our evidence, how can we use evidence? We shall sometimes not be in a position to know the probability of a proposition on our evidence. How then can we follow the rule 'Proportion your belief in a proposition to its probability on your evidence'? There is a recurrent temptation to suppose that we can follow a rule only if it is always cognitively transparent to us whether we are complying with it. On this view, if we are sometimes not in a position to know whether we are Φing when C, then we cannot follow the rule 'Φ when C'; at best we can follow the rule 'Do what appears to you to be Φing when it appears to you that C'. For instance, we cannot follow the rule 'Add salt when the water boils', because we are not always in a position to know whether something is really salt, water or boiling; at best we can follow the rule 'Do what appears to you to be adding salt when what appears to you to be water appears to you to boil'. Can we even follow the modified rule? That something appears to us to be so is itself a non-trivial condition. But we can follow the rule 'Add salt when the water boils',

21. Here is one argument. The rules of assertion permit one to assert p outright if and only if one knows p (Williamson [1996b]).
One's evidence consists of just the propositions the rules of assertion permit one to assert outright Therefore one's evidence consists of just the propositions one knows. The premises of this argument are scarcely uncontroversial. 2 2 Williamson [1996a]. 102 Timothy Williamson even though we occasionally make mistakes in doing so. It is enough that we often know whether the condition obtains. Compliance with a non-trivial rule is never a perfectly transparent condition. We use rules about evidence for our beliefs because they are often less opaque than rules about the truth of our beliefs; perfect transparency is neither possible nor necessary. Just as we can follow the rule 'Add salt when the water boils', so we can follow the rule 'Proportion your belief in a proposition to its probability on your evidence'. Although we are sometimes reasonably mistaken or uncertain as to what our evidence is and how probable a proposition is on it, we often enough know enough about both to be able to follow the rule. It is easier to follow than 'Believe a proposition if it is true', but not perfectly easy. And just as adding salt when the water boils is not equivalent to doing one's rational utmost to add salt when the water boils, so proportioning one's belief in a proposition to its probability on one's evidence is not equivalent to doing one's rational utmost to proportion one's belief in a proposition to its probability on one's evidence. The content of a rule cannot be reduced to what it is rational to do in attempting to comply with it. Evidential probabilities are not rational credences. The next task is to develop a formal framework for the combination of E = K with ECOND, by appropriating some ideas from epistemic logic. Within this framework, the failure of cognitive transparency for evidential probabilities will receive a formal analysis. 4 Epistemic accessibility Start with a set of mutually exclusive and jointly exhaustive circumstances. 
Each circumstance will be required to answer the question 'What evidence have I?', and so will need to specify a subject (to interpret 'I') and a time (to interpret the present tense of 'have'). Thus a circumstance is not an ordinary possible world; my circumstances and yours today and yesterday are four distinct circumstances in the actual world. Circumstances are more like centred worlds. In a given application, circumstances need be specific only in relevant respects (a set of all circumstances is assumed). The relevant propositions are true or false in each circumstance, and closed under truth-functional combinations. Assume that for each set of circumstances, some proposition is true in every circumstance in the set and false in every other circumstance. The truth of a proposition may vary with the subject or time of the circumstance, even if all other features are fixed. Thus the proposition that one is sitting may be true in my actual circumstance now and false in my actual circumstance a minute ago. Let P be a prior probability distribution as in section 1. P is assumed to satisfy the axioms of the probability calculus as stated in terms of circumstances. Thus P(p) = 1 whenever p is true in every circumstance; P(p ∨ q) = P(p) + P(q) whenever p and q are in no circumstance both true. Consequently, if p and q are true in exactly the same circumstances, P(p) = P(q). Call P regular if and only if P(p) > 0 whenever p is true in at least one circumstance. Let e_w be one's total evidence in circumstance w; if P is regular, P(e_w) > 0 for each w, so ECOND defines evidential probabilities P_w(p) = P(p | e_w) everywhere. Regularity also entails that the evidential probability of p is 1 only if p follows from one's evidence, for if p is false in some circumstance in which e_w is true, then P(~p & e_w) > 0, so P_w(p) < 1.

[23] The application of modal logical techniques to epistemological problems was pioneered in Hintikka [1962], although the assumptions made here differ from Hintikka's. A good text for the modal logical background is Hughes and Cresswell [1996].
[24] In some applications 'we' will replace 'I'.
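The apparatus so far (a prior over circumstances, propositions modelled as sets of circumstances, regularity, and ECOND) can be sketched in a few lines. Every circumstance name, weight, and evidence assignment below is an illustrative assumption, not drawn from the paper; modelling propositions as sets of circumstances automatically satisfies the assumption that every set of circumstances is expressed by some proposition.

```python
# Illustrative model: three circumstances, a regular prior given by
# positive unnormalized weights, propositions as sets of circumstances.
circumstances = {"w1", "w2", "w3"}
weight = {"w1": 5, "w2": 3, "w3": 2}

def P(prop):
    """Prior probability of a proposition (a set of circumstances)."""
    return sum(weight[c] for c in prop) / sum(weight.values())

def conditional(prop, evidence):
    """P(prop | evidence); defined whenever P(evidence) > 0, which
    regularity guarantees for any nonempty evidence proposition."""
    return P(prop & evidence) / P(evidence)

# ECOND: the evidential probability of p in w is the prior probability
# of p conditional on e_w, one's total evidence in w (stipulated here
# for the example; note that w is always consistent with e_w).
e = {"w1": {"w1", "w2"}, "w2": {"w1", "w2"}, "w3": {"w3"}}

def ev_prob(prop, w):
    return conditional(prop, e[w])

p = {"w1"}
print(ev_prob(p, "w1"))   # 0.625 (= 5/8)
print(ev_prob(p, "w3"))   # 0.0: p is inconsistent with the evidence in w3
```

Since the prior is regular, `ev_prob` is defined in every circumstance, and a proposition gets evidential probability 1 only when it follows from the evidence, as the text notes.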
Regularity likewise entails that p follows from what one knows if and only if the evidential probability of p is 1, and that p is consistent with what is known if and only if the evidential probability of p is nonzero. Propositions about evidential probability are themselves true or false in circumstances. For example, the proposition that p is more probable than not on the evidence is true in w just in case P_w(p) > 1/2. Thus propositions about evidential probability themselves have probabilities. In the manner of possible worlds semantics, conditions on accessibility correspond to conditions on knowledge, which in turn have implications for evidential probabilities. For example, accessibility is transitive just in case for every proposition p in every circumstance, if p follows from what one knows then that p follows from what one knows itself follows from what one knows (compare the S4 axiom □p ⊃ □□p in modal logic). The latter condition follows from the notorious 'KK' principle that when one knows p, one knows that one knows p; it is slightly weaker, but not in ways which should make it much less controversial. For a regular probability distribution, transitivity is equivalent to the condition that when p has evidential probability 1, the proposition that p has evidential probability 1 itself has evidential probability 1. Accessibility is symmetric just in case for every proposition p in every circumstance, if p is true then that p is consistent with what one knows follows from what one knows (compare the Brouwerian axiom p ⊃ □◊p). For a regular probability distribution, symmetry is equivalent to the condition that when p is true, the proposition that p has nonzero evidential probability has evidential probability 1. There is good reason to doubt that accessibility is symmetric. Let x be a circumstance in which one has ordinary perceptual knowledge that the ball taken from the bag is black. In some circumstance w, the ball taken from the bag is red, but freak lighting conditions cause it to look black, and everything one knows is consistent with the hypothesis that one is in x. Thus x is accessible from w, because every proposition one knows in w is true in x; but w is not accessible from x, because the proposition that the ball taken from the bag is black, which one knows in x, is false in w. Let p be the proposition that the ball taken from the bag is red. In w, p is true, but that p is consistent with what one knows does not follow from what one knows, for what one knows is consistent with the hypothesis that one knows ~p. On a regular probability distribution, the evidential probability in w of the proposition that p has nonzero evidential probability falls short of 1. Such examples depend on less than Cartesian standards for knowledge and evidence; Bayesian epistemology must learn to live with such standards. Moreover, failures of symmetry can result from processing constraints, even when false beliefs are not at issue. For a crude example, imagine a creature that knows all the propositions recorded in its memory; pretend that it is somehow physically impossible for false propositions to be recorded there. Unfortunately, there is no limit to the time taken to deliver propositions from memory to the creature's central processing unit. Now toadstools are in fact poisonous for the creature, but it has no memory of any proposition relevant to this fact. It wonders whether it knows that toadstools are not poisonous. It searches for relevant memories. At any time, it has recovered no relevant memory, but for all it knows that is merely because the delivery procedure is slow, and in a moment the memory that toadstools are not poisonous will be delivered, in which case it will have known all along that they are not poisonous. Everything it knows in the actual circumstance w is true in a circumstance x in which it knows that toadstools are not poisonous; thus x is accessible from w. But w is not accessible from x, because something it knows in x (that toadstools are not poisonous) is false in w. Although in w the proposition p that toadstools are poisonous is true, that p is consistent with what it knows does not itself follow from what it knows. Epistemic logic and probability theory are happily married because the posterior probabilities in w result from conditionalizing on the set of circumstances epistemically accessible from w. This idea has become familiar in decision theory in the context of standard treatments of the concept of common knowledge. As usual, the proposition that p is common knowledge is analysed as the infinite conjunction of p, the proposition that everyone knows p, the proposition that everyone knows that everyone knows p, and so on; thus the analysis of common knowledge requires an account of knowledge.

[27] When there are uncountably many circumstances, no probability distribution is regular (infinitesimal probabilities are not being considered here).
[28] See Williamson [1992], pp. 238–9 and [1994], pp. 245–7 for exploration of some simple models based on this idea.
[29] See Skyrms [1980] and Gaifman [1988] for good discussions of higher-order probability. Their subjectivism introduces complications into their accounts, e.g. Gaifman needs a distinction between the agent's probability and a hypothetical expert's probability to handle higher-order probability. These complications are unnecessary from the present perspective.
[30] Williamson [1992].
[31] See Humberstone [1988] for related issues.
[32] See also Shin and Williamson [1994].
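The ball example above fits a two-circumstance model that can be checked mechanically. A minimal sketch, assuming a uniform regular prior over just the two circumstances (the accessibility relation is the one the example specifies; the uniform weights are an illustrative assumption):

```python
# Circumstances: x (one knows the ball is black), w (freak lighting; the
# ball is red). As in the example, x is accessible from both circumstances,
# while w is accessible only from itself. Uniform unnormalized prior.
access = {"x": {"x"}, "w": {"w", "x"}}
weight = {"x": 1, "w": 1}

def P(prop):
    return sum(weight[c] for c in prop)

def ev_prob(prop, w):
    # Posterior probabilities in w result from conditionalizing the
    # prior on the set of circumstances accessible from w.
    return P(prop & access[w]) / P(access[w])

p = {"w"}  # the proposition that the ball taken from the bag is red

print(ev_prob(p, "w"))  # 0.5: in w, p has evidential probability 1/2
print(ev_prob(p, "x"))  # 0.0: in x, p is inconsistent with what one knows

# The higher-order proposition 'p has nonzero evidential probability'
# is true in exactly those circumstances where ev_prob is positive:
nonzero_p = {c for c in access if ev_prob(p, c) > 0}  # just {"w"}

# Failure of symmetry: p is true in w, yet this higher-order proposition
# falls short of evidential probability 1 there.
print(ev_prob(nonzero_p, "w"))  # 0.5
```

The memory-retrieval example has the same formal shape: swap the labels, with p as the proposition that toadstools are poisonous.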
Something like the framework above is used, with a separate accessibility relation R_a for each agent a but a common prior probability distribution; different agents can have different posterior probabilities in the same circumstance because they have different sets of accessible circumstances to conditionalize on. 'a knows p' (K_a p) is given the semantics of 'p follows from what one knows' with respect to the accessibility relation R_a; thus knowledge is treated as closed under logical consequence (contrast the present account). Furthermore, in decision theory accessibility is usually required to be an equivalence relation (symmetric and transitive as well as reflexive) for each agent. On this model, the agent partitions the set of circumstances into a set of mutually exclusive and jointly exhaustive sets; in w, the agent knows just those propositions true in every circumstance belonging to the same member of the partition as w. Informally, imagine that each circumstance presents a particular appearance to the agent, who knows all about appearances and nothing more; thus one circumstance is epistemically accessible from another if and only if they have exactly the same appearance, which is an equivalence relation. The corresponding propositional logic of knowledge is the modal system S5, with K_a in place of □; one can axiomatize it by taking as axioms all truth-functional tautologies and formulas of the forms K_a(A ⊃ B) ⊃ (K_a A ⊃ K_a B), K_a A ⊃ A and ~K_a A ⊃ K_a ~K_a A, and as rules of inference modus ponens and epistemization (if A is a theorem, so is K_a A). One of the earliest results to be proved on the basis of assumptions tantamount to these was Aumann's 'no agreeing to disagree' theorem: when the posterior probabilities of p for two agents are common knowledge, they are identical. Earlier examples expose some of the idealizations implicit in the partitional model of knowledge.
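The partitional model and the infinite-conjunction analysis of common knowledge can be sketched together. The circumstances and appearance assignments below are invented for illustration; in a finite model the conjunction of p, 'everyone knows p', 'everyone knows that everyone knows p', and so on reaches a fixed point after finitely many steps.

```python
# Two agents; each circumstance presents an appearance to each agent, and
# a circumstance is accessible for an agent iff it presents the same
# appearance. Sameness of appearance is an equivalence relation, so each
# agent's knowledge obeys S5. All names here are illustrative assumptions.
circumstances = {"c1", "c2", "c3"}
appearance = {
    "ann": {"c1": "A", "c2": "A", "c3": "B"},
    "bob": {"c1": "A", "c2": "B", "c3": "B"},
}

def accessible(agent, w):
    """The cell of the agent's partition containing w."""
    return {x for x in circumstances
            if appearance[agent][x] == appearance[agent][w]}

def knows(agent, prop):
    """Circumstances where prop follows from what the agent knows."""
    return {w for w in circumstances if accessible(agent, w) <= prop}

def everyone_knows(prop):
    return knows("ann", prop) & knows("bob", prop)

def common_knowledge(prop):
    """Conjoin prop, E(prop), E(E(prop)), ... until a fixed point."""
    current = prop
    while True:
        nxt = current & everyone_knows(current)
        if nxt == current:
            return current
        current = nxt

p = {"c1", "c2"}
print(knows("ann", p) == {"c1", "c2"})  # True: Ann's cell {c1,c2} lies inside p
print(knows("bob", p) == {"c1"})        # True: at c2 Bob cannot rule out c3
print(common_knowledge(p) == set())     # True: p is nowhere common knowledge
```

Because each partition cell contains its own circumstance, `knows` validates K_a A ⊃ A, and the equivalence-relation structure validates the S4 and S5 schemata as well.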
In particular, the counterexamples to the symmetry of accessibility, and so to the Brouwerian schema ~A ⊃ K_a ~K_a A, are equally counterexamples to the S5 schema ~K_a A ⊃ K_a ~K_a A, given the uncontentious principle that knowledge implies truth (K_a A ⊃ A). Some progress has been made in generalizing results such as Aumann's to weaker assumptions about knowledge. It can be argued that, even when logical omniscience is assumed, the propositional logic of knowledge is not S5 but the modal system KT (alias T), which one can axiomatize by dropping the axiom schema ~K_a A ⊃ K_a ~K_a A from the axiomatization above. What KT assumes about knowledge, in addition to logical omniscience, is just that knowledge implies truth. When K_a A is read as 'it follows from what one knows that A', rather than as 'one knows that A' (where a is the agent of the circumstance), logical omniscience becomes unproblematic for K_a, whatever a's logical imperfections. The exposition of the present theory of probabilities on evidence has now been completed, and some readers may wish to stop at this point. However, deviations from the partitional model generate a phenomenon which seems to threaten the proposed marriage of knowledge and probability. The aim of the final section is to understand that phenomenon.

[33] Game-theoretic work on common knowledge uses the framework described; see Fudenberg and Tirole [1991], pp. 541–72.
[34] Aumann [1976]; the proof relies heavily on the assumption of common prior probabilities.
[35] Bacharach [1985], Geanakoplos [1989], Samet [1990], Shin [1993], Basu [1996].

5 A puzzling phenomenon

The paradoxical phenomenon can be illustrated thus. There are just three circumstances: w1, w2, and x. As in the diagram, x is accessible from each circumstance; each of w1 and w2 is accessible only from itself.
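The three-circumstance model can be computed directly. A sketch, assuming a uniform regular prior over w1, w2, and x (the prior is not specified at this point in the text; the accessibility relation is as just described):

```python
# x is accessible from every circumstance; w1 and w2 are each accessible
# only from themselves. Uniform unnormalized prior (an assumption).
access = {"w1": {"w1", "x"}, "w2": {"w2", "x"}, "x": {"x"}}
weight = {"w1": 1, "w2": 1, "x": 1}

def P(prop):
    return sum(weight[c] for c in prop)

def ev_prob(prop, w):
    # ECOND: conditionalize the prior on the accessible circumstances.
    return P(prop & access[w]) / P(access[w])

# In x one's evidence settles that one is in x; in w1 or w2 it leaves
# that open between the actual circumstance and x.
print(ev_prob({"x"}, "x"))    # 1.0
print(ev_prob({"x"}, "w1"))   # 0.5

# Non-transparency: the proposition that {x} has evidential probability 1
# is true only in x, so in w1 it has evidential probability only 1/2.
prob_one = {c for c in access if ev_prob({"x"}, c) == 1}  # {"x"}
print(ev_prob(prob_one, "w1"))  # 0.5
```

Accessibility here is reflexive and transitive but not symmetric: x is accessible from w1, yet w1 is not accessible from x.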